Older adults’ emotion recognition: No auditory-visual benefit for less clear expressions

The ability to recognise emotion from faces or voices appears to decline with advancing age. However, some studies have shown that emotion recognition of auditory-visual (AV) expressions is largely unaffected by age, i.e., older adults get a larger benefit from AV presentation than younger adults, resulting in similar AV recognition levels. An issue with these studies is that they used well-recognised emotional expressions that are unlikely to generalise to real-life settings. To examine if an AV emotion recognition benefit generalises across well and less well recognised stimuli, we conducted an emotion recognition study using expressions that had clear or unclear emotion information for both modalities, or clear visual, but unclear auditory information. Older (n = 30) and younger (n = 30) participants were tested on stimuli of anger, happiness, sadness, surprise, and disgust (expressed in spoken sentences) in auditory-only (AO), visual-only (VO), or AV format. Participants were required to respond by choosing one of 5 emotion options. Younger adults were more accurate in recognising emotions than older adults except for clear VO expressions. Younger adults showed an AV benefit even when unimodal recognition was poor. No such AV benefit was found for older adults; indeed, AV was worse than VO recognition when AO recognition was poor. Analyses of confusion responses indicated that older adults generated more confusion responses that were common between AO and VO conditions than younger adults did. We propose that older adults’ poorer AV performance may be due to a combination of weak auditory emotion recognition and response uncertainty that resulted in a higher cognitive load.


Introduction
In old age (e.g., over 70 years old), the ability to recognise facial or vocal expressions of emotion typically gets worse [1,2]. Emotion recognition is a core component of social cognition that underpins favourable social interaction. As such, determining the extent of this decline in emotion recognition, and the reasons for it, is important for understanding how older adults' social functioning may be impacted [3,4]. Poorer emotion recognition in older age has been explained in different ways. Some researchers have adopted a neurophysiological standpoint in which poorer emotion recognition is a consequence of age-related changes in particular brain regions [5,6]. Other proposals mainly consider cognitive, social, or motivational factors. For instance, it has been proposed that older adults are better at using effective strategies for emotion regulation [7].
The development of these theories of older adult emotion recognition is driven by discoveries about the factors that modulate older adults' recognition performance. A good example is the finding that older adults are not uniformly poor at recognising emotion; they tend to be better at recognising a positive expression like happy and worse at recognising a negative one like angry (the positivity effect, see [8] for a recent meta-analysis; for a discussion of the flip side of this effect, the negativity effect, see [9]). This finding plays a central role in prominent motivation-based theories that explain the positivity effect in terms of older adults prioritising goals to boost the prominence of emotionally gratifying information [10]. For theories that posit age-related degradation of brain function, the positivity effect has been explained in terms of older adults' weaker amygdala activation for emotional negative stimuli [5].
Another factor that has a clear influence on older adults' emotion recognition, which is the focus of the current study, concerns the way that the expressions are presented. Traditionally, most research on aging and emotion recognition has used unimodal (i.e., facial or vocal) expressions as stimuli. It is only relatively recently that studies have examined multimodal emotion recognition, e.g., in which emotional expressions are both seen and heard, in an effort to have more naturalistic stimuli [11-13]. Generally, results have shown that older adults do much better with multimodal presentation of emotions, to the extent that their performance level can be similar to or even better than that shown by younger adults [12]. A finding that older adults' multimodal emotion recognition performance is like that of younger adults would most likely require a change in the theories to accommodate it. Moreover, it could be inferred from such results that older adults may not experience emotion recognition problems in real life where multimodal presentation of emotional expressions is likely to be common.
There are, however, several aspects of past studies that have used multimodal stimuli with older adults that motivate further investigation. The first concerns a general issue about the aim of previous studies. Most of these studies focussed simply on determining whether older adults receive a benefit from the combination of auditory and visual emotion information. As such they were less concerned with how such a multimodal advantage occurs.
The second related issue is that past studies have not examined older adults' use of AV emotion information for stimuli that do not portray high-agreement stereotypical depictions of emotional expressions. That is, because previous studies of older adults' multimodal emotion recognition largely focussed on demonstrating a multimodal benefit, they used stimuli that were conducive to showing such an effect. However, the stimuli used were not representative of the range of emotional expressions likely to be encountered in real life. For example, some studies have used a still picture combined with an auditory stimulus, rather than dynamic faces producing ongoing speech [11,14]. Also, some studies evaluated a very small number of emotions, e.g., only two [15]. Moreover, in many studies, the auditory and visual emotion stimuli were selected to be highly recognisable [11]. Not only might this be a problem for generalising the results to daily life, where a far greater range of stimuli would be expected (see [16] for an overview of the range of AV expressive signals), but the use of unambiguous stimuli (selected to have high label agreement levels) could reduce any difference between younger and older adults by producing ceiling effects. Indeed, studies that have used more naturalistic stimuli have produced unclear results with respect to whether multimodal presentation boosts older adults' emotion recognition to younger adult levels. For example, Cortes and colleagues [13] found that unbiased hit-rates for younger and older adults were not significantly different for auditory and visual (AV) emotion presentations (t value = 1.95, p = .053), but a lack of statistical difference does not mean the performance was equivalent. Indeed, calculating the Bayes Factor (BF01) for this contrast resulted in a value of 0.96, indicating that there was insufficient evidence to decide for or against a difference. Moreover, Cortes et al did not analyse how different emotions were recognised, noting the test used had only a small number of items per emotion and so was not suitable for a finer-grained analysis.
Given the above, the current study had two aims: first, to examine how older adults combine non-stereotypical auditory and visual emotion information; second, to investigate the process of AV integration in more depth than previous studies. To tackle the first aim, we tested older adults with AV expressive speech that varied in clarity of emotion information. It is well established that younger adults can use AV emotion information to achieve better performance than auditory only (AO) or visual only (VO) information alone (e.g., [17]). Moreover, it has been demonstrated that such an AV benefit even occurs when unimodal emotion information is not well recognised [18]. If older adults can get such an AV benefit, i.e., even when their recognition performance on one of the modalities is poor, then with multimodal stimuli, they should be able to achieve a recognition level closer to that of younger adults.
To investigate the process of AV integration in greater detail, we followed up Kim and Davis [18] by analysing the patterns of correct and incorrect response data (note that, depending on the context, we refer to incorrect responses as confusions, competitors or non-target responses). To explain the AV benefit that young adults receive, Kim and Davis [18] examined the confusion matrices (emotions that were selected as the response when an error was made) and how the pattern of these in the unimodal (AO and VO) conditions changed in the bimodal (AV) one. In brief, they found that the confusion matrices for the different unimodal presentations of the same emotion tended to be non-overlapping, i.e., the confusions for a visually presented emotion tended not to be those made for an auditory presentation of that emotion. A confusion in the AV condition hardly ever occurred when it did not occur in both unimodal presentation conditions. Kim and Davis [18] interpreted this result in terms of how evidence for a specific emotion is evaluated from bimodal sources. For example, consider the case when "ANGRY" is presented audio-visually. It was suggested that when both sources include the same candidate emotion (e.g., "angry" in AO and VO modalities) then evidence for that interpretation is reinforced. When a candidate emotion is supported by only one modality (e.g., "happy" just in AO, or "disgust" only in VO) then evidence for that emotion would be completely discounted. This evidence accumulation model explains how information from a poorly recognised expression in one modality can nevertheless boost bimodal recognition, i.e., the interpretation that is common to both modalities ("angry" in the above example) would be boosted, but unimodal confusions (unique to that modality) would have no influence.
If, with a poorly recognised unimodal emotion stimulus, older adults produced a broad range of confusions rather than the typical ones that younger adults make, then they may not get an AV benefit. That is, whereas for younger adults the confusion patterns tend to be non-overlapping, this may break down for older adults if too many candidate analyses are generated. To make this clear, consider the angry example again. If, for AO presentation, older adults mistook auditory "ANGRY" not only for "happy" but also for "disgust", then, because visual-only "ANGRY" is sometimes mistaken for "disgust", the disgust interpretation would be reinforced with AV presentation rather than eliminated. This would lead to a larger number of viable interpretations, and this could potentially undermine any AV benefit. In this regard, it is important to analyse both correct and confusion responses to understand whether and how older adults get an AV benefit for stimuli that present unclear emotion information.
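The reasoning in the two preceding paragraphs can be illustrated with a minimal sketch. It treats each modality as producing a set of candidate emotions and keeps only the candidates supported by both. This all-or-none simplification and the candidate sets are assumptions made for illustration; they are not data or a model taken from [18].

```python
def av_candidates(ao_candidates, vo_candidates):
    """Return the interpretations supported by both modalities.

    Simplifying assumption: a candidate found in only one modality
    is discounted completely; one found in both survives.
    """
    return set(ao_candidates) & set(vo_candidates)


# Hypothetical candidate sets for an "ANGRY" stimulus (illustrative only).
vo = {"angry", "disgust"}                # visual-only candidates
younger_ao = {"angry", "happy"}          # narrow auditory confusions
older_ao = {"angry", "happy", "disgust"} # broader auditory confusions

younger_av = av_candidates(younger_ao, vo)
older_av = av_candidates(older_ao, vo)

print(sorted(younger_av))  # only the target survives
print(sorted(older_av))    # "disgust" survives alongside the target
```

With non-overlapping confusions only the target remains, whereas one shared confusion is enough to leave a competitor in play for AV presentation.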
As mentioned above, a limitation of many previous studies on older adults' AV emotion recognition is that the stimuli used were selected to have high emotion recognition rates and so these stimuli would generate few alternative candidates [11,15,19]. As such, these stimuli cannot test how well the perceiver can use AV emotion information to winnow out candidate analyses. One study [20] did use realistic AV emotion portrayals that had both high and low recognition rates, and they found a decline in emotion recognition still occurred with increasing age. This result shows that the presentation of AV emotion information in itself does not close the gap between younger and older adult recognition performance. Interestingly, the study found that the emotions that were less well recognised showed the highest (negative) correlation between age and recognition performance, i.e., compared to younger adults, older adults had poorer emotion recognition for emotions that were not well recognised. However, as this study [20] did not use AO or VO stimuli, the results do not enable a quantification of AV benefit.
A recent study by De Boer and colleagues [21] has explicitly examined how older adults integrate auditory and visual emotion information when signals from either or both modalities have been degraded to reduce recognition performance. The study used naturalistic AV spoken emotion stimuli that presented both facial and body emotional gestures (i.e., the head and trunk of actors) from the core set of the Geneva Multimodal Emotion Portrayals (GEMEP, see [22]). De Boer and colleagues [21] found that the older participants showed approximately the same pattern of emotion recognition across degraded conditions as younger participants, i.e., older adults were as good as younger ones at integrating audio and video to improve emotion recognition.
On the face of it, the De Boer et al study [21] appears to have answered the current research question about whether older adults integrate clear and less clear emotion information as well as younger adults do. However, there are two related reasons why the two studies are tackling different issues, and it is instructive to unpack these. The first reason is that [21] was specifically about the effect of sensory degradation on emotion recognition and how young and older adults compensate for this. The tacit assumptions here are that emotion renditions are typically unambiguous, and that older adults' emotion recognition problems are due to issues in peripheral sensory processing that result in the degradation of the emotion signals. In contrast, the issue that the current study addressed was how young and old adults recognise emotions when the emotion expressions themselves differ in their clarity. In using less clear emotion stimuli, we assume that in real life not all emotion renditions are clear and that younger and older adults may differ in how well they can combine AV information for clear and less clear expressions. The second reason why [21] does not address current concerns is to do with the method used to produce poorer emotion recognition. The De Boer et al study [21] did an excellent job in employing realistic methods of degrading emotion signals; methods that simulate age-related sensory problems, i.e., eye fixation linked Gaussian blur of the image to simulate age-related macular degeneration and a hearing loss simulation for audio. However, these imposed degradations would be obvious to the participants, and would act as a signal to them to give less weight to the degraded information source.
Unlike [21], the current study is interested in the case where there is no obvious degradation upon which to base the weighting of information from the sources. As such, the emotion stimuli used in the current study were selected based on the recognition results of a previous study [18] so that there was an assortment of well recognised and less well recognised stimuli. The well recognised stimuli in [18] were mostly in the visual modality and by one presenter (henceforth the clear presenter, all over 90% correct), although Disgust, Sad and Happy had high visual recognition rates for the other (unclear) presenter too (all above 80% correct). The auditory stimuli were less well recognised in [18]: for the clear presenter, only Disgust was poorly recognised (around 50% correct); for the unclear presenter, all of the emotional expressions except Sad were poorly recognised (40-63% correct). It is this mix of clear and unclear AV spoken emotion stimuli that provides the testing ground to determine how young and older adults integrate cross-modal emotion information.
In sum, the current study examined whether older adults would benefit from AV presentation when information in one modality was unclear and, if so, how the size of the AV benefit for older adults would compare with that for younger adults and how such a multimodal advantage might be produced. Given the previous findings [19,20,23], it was predicted that older adults would achieve higher levels of emotion recognition with AV stimuli than with unimodal ones for stimuli that were well recognised in each modality (i.e., stimuli from the clear presenter). Furthermore, it was predicted that for these stimuli, older and younger adults would get a similar sized AV benefit. However, based on [20], we expected that the size of the AV benefit would be smaller in older adults for ambiguous expressions (i.e., stimuli from the unclear presenter).

Method
The study was approved by Western Sydney University Human Ethics Committee (H10938); and has been conducted according to the principles expressed in the Declaration of Helsinki. Upon arrival, each participant was given information on what the study involved (e.g., the task & duration, their right to withdraw, etc) and then informed consent was obtained in a written form.

Participants
Thirty younger adults (mean age = 20, SD = 2.4, 20 women) from Western Sydney University participated in this study for course credit. Thirty older adults (mean age = 72, SD = 6.1, 13 women) who were recruited from local community groups (e.g., senior computer clubs) participated for monetary reimbursement. Sample sizes were determined based on [19]. All participants had learnt English at age 7 or younger. Participants were given the Mini-Mental State Exam (MMSE) [24] as the presence of dementia has been associated with poor emotion processing [25]. Younger (M = 29) and older (M = 28) adults scored within the normal range (above 23), indicating no presence of dementia. Participants had no reported history of psychiatric disorders except three older adults: one had former PTSD and was still on antidepressant/antianxiety medicine; one had sleep apnoea; and one had migraines/headaches.

Materials
The stimuli were selected from [18] and consisted of audio and video recordings from two men, who were native Australian English presenters (in their early twenties) uttering 8 Semantically Unpredictable Sentences [26]. In uttering each sentence, the presenters were instructed to express anger, sadness, disgust, surprise, happiness, or neutral as if they were communicating this emotion to an observer. The total of 96 (2 x 8 x 6) recordings were edited to produce AO, VO, and AV stimuli. A woman presenter's expressions were also selected for 12 practice trials.
As described previously, we selected stimuli from two presenters based on the average correct emotion recognition scores of 55 young participants from a previous study [18] (see Table 1). For the clear presenter, the emotion expressions were clearly recognisable (although this was not the case for Disgust in the AO condition). For the unclear presenter, the emotion expressions were poorly recognised in the auditory modality but produced a mixture of clear and unclear visual expressions.

Procedure
Participants were tested individually in a quiet room. They were told they would 'see', 'hear', or 'see and hear' a person conveying various emotions (while speaking) across trials. They were instructed to select one of 5 options on the screen that they thought best described the expressed emotion. As shown in Fig 1, for each trial, a fixation point was presented (50ms), followed by an experimental item (approx. 6 secs). Then, 5 boxes labelled "Angry", "Sad", "Disgust", "Surprise", and "Happy" were presented for a response (using the mouse to position the cursor over the selected option and pressing the mouse button). Participants were presented with 240 trials: three blocks (AO, VO, AV) of 40 trials (8 sentences, 5 emotions) for each of 2 presenters. Prior to each presenter block (in each of the AO, VO, and AV blocks), two neutral expressions of that presenter were displayed. Participants were told to use these expressions as a baseline to compare with subsequently presented emotions. Trials were blocked by presenter to reduce the difficulty of the task. That is, by blocking the presenter trials, as per [18], participants were given the opportunity to focus on a single presenter's expressions and to compare emotion expressions within that presenter block (to ultimately assist emotion recognition). Practice items were presented before experimental ones.
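The trial count given above can be checked by enumerating the design. The following is a minimal sketch: the variable names are illustrative, and it only enumerates the factorial combinations. It does not reproduce the DMDX script, the counterbalancing, or the within-block randomisation.

```python
from itertools import product

# Design factors as described in the Procedure (names are illustrative).
MODALITIES = ("AO", "VO", "AV")
PRESENTERS = ("clear", "unclear")
SENTENCES = range(1, 9)  # 8 Semantically Unpredictable Sentences
EMOTIONS = ("angry", "sad", "disgust", "surprise", "happy")

# Every modality x presenter x sentence x emotion combination is one trial.
trials = list(product(MODALITIES, PRESENTERS, SENTENCES, EMOTIONS))

# Trials within one presenter block of one modality block.
trials_per_presenter_block = len(SENTENCES) * len(EMOTIONS)

print(len(trials), trials_per_presenter_block)
```

The enumeration yields 40 trials per presenter block and 240 experimental trials in total, matching the figures stated in the text.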
The order of the three blocks was counterbalanced across participants. Half of the participants received the clear presenter trials first for all three blocks. The presentation order of stimuli within each presenter block was randomised. For stimulus display and response collection, we used DMDX [27].
After the experiment, hearing and visual acuity were assessed since poor acuity may unduly affect the processing of acoustic and visual signals important for emotion identification. Hearing acuity was assessed by the first author (under the supervision of an experienced audiologist) for both ears using pure tone audiometry (Diagnostic Audiometer, AD229e) for 0.5, 1, 2, and 4 kHz. The Freiburg Visual Acuity Test (FrACT) [28] was used for both eyes and, if required, participants wore corrective glasses during the eye assessment. The MMSE was also

Statistical and additional analyses
Percent correct scores were calculated (using the Emmeans R package [29]) and were analysed with a repeated measures ANOVA with modality (AO, VO, AV), presenter (clear, unclear), and emotion (angry, sad, disgust, surprise, happy) as within-participant factors, and age as a between-participants factor (see S1 Appendix for the participant data). To examine the impact of expression clarity and age on the recognition of unimodal and bimodal expressions, separate repeated measures ANOVAs were conducted for the presenters with modality and emotion as within-subject factors, and age as the between-subjects factor.
Hearing level. Older and younger adults' hearing level scores (as estimated by pure tone audiometry) were averaged across the right and left ears. To explore whether older adults' hearing ability was associated with emotion recognition performance, a multiple regression analysis was performed between the recognition scores for the AO condition and hearing level for the four hearing frequencies. As per [30], we also separated the older adult participant group into two subgroups based on their hearing ability and compared them on the AO emotion recognition task.
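The hearing-level steps can be sketched as follows, using the subgroup thresholds reported in the results (above 30 dB at 500 or 1000 Hz, or above 40 dB at 2000 Hz). The function names and the audiogram values are hypothetical, and applying the thresholds to ear-averaged levels is an assumption rather than a detail stated in the text.

```python
def mean_across_ears(left, right):
    """Average hearing level (dB) per frequency (Hz) across the two ears."""
    return {freq: (left[freq] + right[freq]) / 2 for freq in left}


def poorer_hearing(levels):
    """True if levels exceed 30 dB at 500 or 1000 Hz, or 40 dB at 2000 Hz.

    Assumption: the thresholds are applied to ear-averaged levels.
    """
    return levels[500] > 30 or levels[1000] > 30 or levels[2000] > 40


# Hypothetical audiogram for one participant (dB at each test frequency).
left = {500: 25, 1000: 30, 2000: 45, 4000: 50}
right = {500: 25, 1000: 30, 2000: 35, 4000: 60}

avg = mean_across_ears(left, right)
print(avg, poorer_hearing(avg))
```

A participant whose averaged levels sit exactly at a threshold falls in the better-hearing subgroup, consistent with the "equal to or below" wording in the results.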
Efficient AV integration. In addition to examining the possible influence of sensory factors in how well auditory and visual information are used in the selection of the correct response option, we also calculated a measure of AV integration efficiency based on the visual and auditory information by considering the pattern of errors for the visual and auditory only presentations. To gauge the efficiency with which information from the separate modalities was combined, we compared performance in the AV condition with what might be expected from the integration of the auditory and visual emotion information given by a multiplicative integration rule (essentially a Bayesian likelihood ratio). To do this, we used the full response patterns and followed [31] in which the predicted probability of a response was calculated based on the unimodal recognition performance. In this calculation, the combined AV support for the alternatives is considered; and ambiguous information (as indicated by different levels of support for auditory and visual recognition) is given less weight in the AV combination. So, for example, the probability of responding Happy given a happy auditory (A) presentation and a happy visual (V) one is given by Eq 1. The value obtained from this formula represents the optimal combination of the AV information given the confusions that have been made to the unimodal stimuli; it will be compared with actual AV performance to determine whether the participants are efficiently combining the A and V sources of information.
Number of candidate emotions selected. To further examine whether there was a difference between young and older adults in the efficiency of combining information from the visual and auditory modalities, we determined how many emotions other than the presented one (target) were selected for the VO, AO and AV presentation conditions (i.e., how many competitor candidates were selected).
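The multiplicative integration rule described under "Efficient AV integration" can be sketched as below. The exact form of Eq 1 is not shown in the text, so the normalised product over the five response alternatives used here is an assumption about its general shape, not the formula from [31]. The function name and the response proportions are hypothetical.

```python
EMOTIONS = ("angry", "sad", "disgust", "surprise", "happy")


def predict_av(p_ao, p_vo):
    """Predicted AV response probabilities under a normalised product rule.

    Assumed form: P(r | A, V) = P(r | A) * P(r | V) / sum_k P(k | A) * P(k | V),
    where k ranges over the response alternatives.
    """
    products = {e: p_ao.get(e, 0.0) * p_vo.get(e, 0.0) for e in EMOTIONS}
    total = sum(products.values())
    if total == 0:
        raise ValueError("no alternative is supported by both modalities")
    return {e: value / total for e, value in products.items()}


# Hypothetical unimodal response proportions for a happy stimulus.
p_ao = {"happy": 0.5, "surprise": 0.3, "angry": 0.2}
p_vo = {"happy": 0.8, "surprise": 0.1, "sad": 0.1}

pred = predict_av(p_ao, p_vo)
print(round(pred["happy"], 3), round(pred["surprise"], 3))
```

Under this rule a response chosen in only one modality receives a predicted probability of zero, and the predicted rate for the shared target exceeds either unimodal rate. An actual AV score can then be compared against this prediction.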
An analysis of response confusions is provided in the "Number of candidate emotions selected" section below. For now, it is worth pointing out that older adults have a broader spread of confusions (particularly for the AO presentations of the unclear presenter) and that for older adults, there are a greater number of confusions that are common to the AO and VO conditions.

Emotion recognition accuracy
Overall correct emotion recognition scores for the younger and older adults are presented in Fig 3. For younger adults, AV scores were almost always higher than the highest unimodal score as per [18]. Conversely, older adults did not show improved AV scores relative to unimodal ones; indeed, for most cases, the AV scores for older adults were numerically lower than the VO ones. This result suggests that compared to younger adults, older adults were less adept at combining auditory and visual information.
The main effect of emotion, F(4,232) = 37.79, p < .001, ηp² = .40, was significant. There was a significant interaction between modality, emotion, and age group, F(8,464) = 3.00, p < .01, ηp² = .05, and this was analysed further using a Bonferroni adjusted alpha.
Post hoc analyses revealed that for expressions of disgust, older adults showed worse performance for AV relative to VO expressions (p < .05). For the remaining expression types (i.e., angry, sad, surprise, happy) older adults showed no performance differences when these expressions were conveyed visually or audio-visually (p > .32 for all comparisons). Older adults recognised AV expressions better than AO expressions across all emotion types (p < .001). For younger adults, there were no differences across any emotions conveyed via VO and AV modalities (p > .27). This was also the case for AO expressions of anger and sadness where there were no performance differences between AO and AV renditions of these emotions (p > .41). Only expressions of happy, surprise, and disgust were less well recognised when conveyed via the AO modality than when conveyed via the AV modality (p < .01). These analyses also revealed that younger adults outperformed older adults for most of the AO expressions (p < .01) and AV emotional expressions (p < .02) except for AO expressions of surprise (p = .07) and AV expressions of happy (p = .27), where older and younger groups did not show a difference. There were no differences between older and younger percent correct scores for most of the emotions in the VO presentation modality (p > .16); except for angry where younger adults outperformed older ones (p < .01).
Post hoc analyses revealed that younger adults showed better recognition of emotions when presented with AV than VO expressions, followed by AO expressions (p < .02). Older adults had worse recognition in the AO compared to the AV condition (p < .001); but showed no significant differences when recognising emotions in the VO and AV conditions (p > .99). These analyses also indicated that younger adults outperformed older adults across all modality types (p < .001 for all comparisons).

Visual and hearing acuity and emotion recognition
Most participants had normal or corrected-to-normal visual acuity (a score of 1); the exceptions were one younger and three older adults who had slightly worse than normal vision, with scores over 0.75. As can be seen in Fig 4, older adults' hearing level was poorer than that of younger adults across all frequencies. Younger adults had an averaged hearing level of 12dB across all frequencies (range = 10dB-15dB) whereas older adults averaged 32dB (range = 28dB-40dB).
A multiple regression analysis indicated that hearing levels were not significantly associated with performance for the AO emotion recognition task, adjusted R² = .1, F(4,29) = 2.1, p = .11. When the older adults were separated into two groups based on hearing ability, one group, consisting of 12 participants, had hearing levels above 30dB for 500 or 1000Hz or above 40dB for 2000Hz. The remaining 18 participants had hearing levels equal to or below these thresholds. In line with [30], these two groups did not differ in their performance for the AO emotional expressions, F(1,29) = 1.99, p = .17, ηp² = .07, providing further evidence to suggest that levels of hearing sensitivity were not associated with AO emotion recognition.
Table 2 shows the calculated efficient integration scores (see the "Efficient AV integration" section above) and the actual AV emotion recognition scores. There are two important aspects of the table to note: the first is the overall level of information available as given by the efficient integration score; the second is whether the actual score was different from the efficient integration score (as indicated by the asterisk). For the clear presenter, younger adults' actual AV performance did not significantly differ from that estimated by the efficient integration calculation (and both were at ceiling). For the unclear presenter, the actual performance was on average about 6% worse and significantly different for disgust, sad and happy. The older adults' actual AV recognition scores for the clear presenter were about 7% worse than what would be expected under efficient integration (which is at ceiling levels). For the unclear presenter, the actual recognition score was about 10% worse than the estimated optimal score, and the recognition scores for sad and happy were significantly lower than the efficient integration estimate.

Number of confusions (non-target competitor emotions selected)
The data in Table 3 show that, except in the older adults' AO condition, more competitors were selected in unclear than clear conditions (p < .001 for all). In general, older adults selected more competitors than did younger adults (p < .001 for all) and this was especially the case for AO presentation. The data further show that for older adults, presenting auditory emotion expressions in addition to visual ones (i.e., AV presentation) did not reduce the number of competitors compared to only having the visual (VO) presentation (p > .1 for both Clear and Unclear presenters), whereas it did for the younger adults (p = .012 for the Clear presenter; p < .001 for the Unclear presenter). The number of competitors common in AO and VO (i.e., the degree of overlapping confusions across modality) was greater for older than younger adults in both clear and unclear conditions (p < .001 for all). In the AV conditions, the number of these competitors decreased but the difference between the age groups was maintained (p < .001 for all contrasts).
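The competitor counts discussed above can be derived from response tallies along the following lines. This is a minimal sketch: the function name and the tallies are hypothetical, and the level at which the study counted competitors (per participant, per stimulus, or per emotion) is not specified here, so a single tally per condition is assumed.

```python
def competitors(response_counts, target):
    """Emotions other than the target that were selected at least once."""
    return {e for e, n in response_counts.items() if n > 0 and e != target}


# Hypothetical response tallies for an angry target (8 trials per condition).
ao = {"angry": 5, "happy": 2, "disgust": 1, "sad": 0, "surprise": 0}
vo = {"angry": 7, "happy": 0, "disgust": 1, "sad": 0, "surprise": 0}

ao_comp = competitors(ao, "angry")
vo_comp = competitors(vo, "angry")
common = ao_comp & vo_comp  # competitors shared by the AO and VO conditions

print(len(ao_comp), len(vo_comp), sorted(common))
```

The size of the shared set corresponds to the "competitors common in AO and VO" measure, which the text reports as larger for older adults.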

Discussion
The current study investigated whether older adults would show improved emotion recognition performance for AV as compared to VO or AO presentations; and if so, whether this improvement would be of a similar size to (or even greater than) that shown by younger adults. We were particularly interested in whether older adults would show improved recognition performance for AV presentations even for stimuli where the expression in one modality was poorly recognised (as has been shown for younger adults in [18]). Whether or not older adults can efficiently combine auditory and visual emotion information to better recognise an unclear expression is relevant for gauging how well they might cope with the vagaries of real-life stimuli.
We begin by considering the results for expressions of the Unclear presenter, as this condition was a focus of the study. The results showed that overall, older adults' recognition performance was poorer than that of the younger adults (M = 15.2% worse). The difference between the age groups was largest for AO presentation (M = 20.3%) and smallest for VO presentation (M = 10.5%), with the difference for AV presentation in between (M = 14.7%). It is interesting that older adults' recognition scores with AV presentation would have been closer to those of the younger adults if they had simply ignored the concurrent auditory information (as was done by the older adults in the De Boer et al. study [21] when the auditory information was artificially degraded). In terms of AV benefit, the younger adults had better emotion recognition for AV presentations relative to the best unimodal condition (typically VO expressions). This AV benefit for the Unclear presenter replicated the results of [18]. In contrast, older adults did not get an AV benefit. That is, for all expressions (except surprise), AV presentation did not improve recognition performance relative to VO performance. This latter result stands in contrast to those of previous studies that have found older adults were able to benefit from AV emotion presentation, and did so to the same degree as, or even more than, younger adults [19,23].
A possible reason why older adults were unable to get an AV benefit is related to their poor AO emotion recognition. To make this point clear, we consider why older adults had poor AO emotion recognition and how this may have eliminated any AV benefit (and in one case even produced worse performance than VO presentation alone). Here, it should be pointed out that older adults' poor AO emotion recognition did not appear to be due to reduced hearing sensitivity per se, i.e., older adults' emotion recognition in the AO condition was not correlated with their hearing acuity. This finding is consistent with [32], who showed that aiding older adults' hearing (with hearing aids) boosted their word recognition scores (almost doubling them: unaided accuracy was 38.1% correct; aided accuracy, 65.1% correct) but had no significant effect on their emotion recognition accuracy scores (36.0% unaided, 41.8% aided); see also [33].
We propose that older adults' poor AO emotion recognition was due to their lower sensitivity to emotion cues compared to younger adults. Evidence for this comes from research that has shown that older adults tend to be less sensitive to auditory cues for emotion. For example, Ben-David and colleagues [34] presented younger and older adults with spoken sentences that conveyed emotion (anger, fear, happiness, sadness) or were neutral and had them rate how much they agreed that the sentence conveyed the predefined emotion. They found that older adults gave lower ratings than younger adults when an emotion was present. Other studies have shown similar results, with older adults less sensitive than younger adults to emotion cues such as intensity or arousal [33]. This insensitivity to emotion cues may have less of an effect on emotion recognition performance when such cues are pronounced. However, if emotion cues are weak, then older adults' insensitivity to emotion cues is likely to result in lower emotion recognition performance. Note that the speech of the Unclear presenter has weaker cues to auditory emotion than that of the Clear presenter. In [35], Davis and Kim trained a classification model (logistic regression with a ridge estimator) on the utterances of the Clear and Unclear presenter respectively (as parametrised using the auditory attributes of the Interspeech 2009 emotion challenge [36]), then examined how well the model could classify a hold-out set of stimuli from each presenter. It was found that the model trained on the Clear presenter's emotional speech had better classification performance than that trained on the speech of the Unclear presenter, indicating that the Unclear presenter produced weaker (or inconsistent) cues to the different emotions.
We suggest that older adults' insensitivity to emotion cues (especially to the emotional speech of the Unclear presenter) had the combined effect of low AO emotion recognition and more response confusions. This is because the absence of a strong cue for an emotion could permit other interpretations of the input to be more viable. Under competitive activation-inhibition emotion models, e.g., [37], it is assumed that a strongly activated emotion representation inhibits other ones, whereas a weakly activated one allows more competitors to be active. The production of more unimodal response confusions could reduce AV emotion recognition scores in the following ways. First, more response confusions limit what can be achieved by integration. That is, poor unimodal recognition will limit what can be obtained by combining information sources (the Efficient score) because an increase in the number of confusion responses will increase the denominator of Eq 1 (see above), thus lowering the overall result. This is clear from Table 2, which shows a large difference in the Efficient integration scores between the younger and older adult groups (i.e., when comparing the younger and older adult scores in the row labelled "Efficient" in Table 2). The Efficient score represents the optimal integration of auditory and visual information (i.e., what could be obtained from the available information).
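The dependence of an efficient integration estimate on the number of confusions can be illustrated with a short Python sketch. It assumes, for illustration only, a multiplicative rule in which the AO and VO response proportions for each emotion are multiplied and then normalised by the sum of these products over all response options; Eq 1 may differ in detail. The function name and the response proportions are hypothetical and are not data from the study.

```python
def efficient_score(p_ao, p_vo, target):
    """Assumed multiplicative integration: the target's AO x VO product
    divided by the sum of AO x VO products over all response options."""
    products = {e: p_ao[e] * p_vo.get(e, 0.0) for e in p_ao}
    denom = sum(products.values())
    return products[target] / denom if denom > 0 else 0.0


# Hypothetical response proportions for a "disgust" stimulus.
p_ao = {"disgust": 0.6, "anger": 0.4, "sadness": 0.0}

# Case 1: the AO and VO confusions fall on different emotions.
p_vo_distinct = {"disgust": 0.8, "anger": 0.0, "sadness": 0.2}
# Case 2: the AO and VO confusions fall on the same emotion (anger).
p_vo_shared = {"disgust": 0.8, "anger": 0.2, "sadness": 0.0}

score_distinct = efficient_score(p_ao, p_vo_distinct, "disgust")
score_shared = efficient_score(p_ao, p_vo_shared, "disgust")
```

Under this assumed rule, confusions that occur in only one modality contribute nothing to the denominator, whereas a confusion common to AO and VO enlarges it and lowers the estimate, even though unimodal accuracy for the target is the same in both cases.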
The second way that more unimodal confusions may affect AV emotion recognition scores is by limiting how well the AV integration process works. Here the effectiveness of AV integration is given by the difference between the Efficient score and the actual score. The results showed that older adults were less effective integrators than the younger adults (see Table 2). A possible reason for this is that older adults' weaker activation of emotion cues led to more emotion candidates being considered (see Table 3), and this placed greater demands on cognitive processing. Moreover, as analysis of the confusion responses indicated, older adults had more confusion responses that were common across the AO and VO conditions than younger adults, and such confusions seem harder to overcome than those that occur in only a single modality (e.g., [18]). As such, older adults need to rely more on executive function to overcome the consequences of weaker perceptual processing. This idea is consistent with the results of a recent brain imaging study [38]. This study used auditory and visual speech stimuli with younger and older adults and found that younger adults had greater activation of auditory (superior temporal gyrus) and visual (intra-calcarine cortex) sensory-related cortical regions than older adults who, in turn, had greater relative activation of dorsal regions associated with executive function. Such executive-based processing may be inefficient when many perceptual candidates (competitors) have been generated.
The results for the VO and AV expressions of the Clear presenter were effectively at ceiling for both older and younger adults, and so could not be used to assess AV benefit. The results for disgust, however, differed from the rest, since for this expression older adults' recognition performance was slightly worse when auditory information was added to the visual information. This occurred despite older adults' VO performance being high (not different from the other expressions). One reason why AV was worse than VO recognition may be older adults' very poor recognition of AO disgust (just 26% correct). Such a weak 'signal' for auditory disgust meant not only that it provided little support for selecting "disgust" as a response, but also that the selection and maintenance of other candidates would not have been inhibited by a clearly dominant interpretation. This lack of a dominant stimulus interpretation in the AO condition likely resulted in two things: the first is that AO confusions would potentially have reinforced VO ones; the second is that an increased number of potential interpretations (as indexed by more confusions) would have increased cognitive load. Both factors would have led to poorer performance for AV stimuli.

Limitations and future directions
There are several limitations to this study. First, the MMSE was used to screen out older adults who had a decline in cognitive function. Although the MMSE is commonly used as a broad guide to cognitive functioning, it may not be that sensitive in detecting milder forms of cognitive impairment [39]. Second, in the current experiment older adults were presented with the emotional expressions of younger adults. The issue of whether older adults show an own-age bias in recognising emotion is undecided, with some studies producing evidence for an own-age bias for some emotions [40] and other studies indicating that the facial emotion expressions of older adults are difficult to identify regardless of the observer's age [41]. Nevertheless, given that older adults would spend more time interacting with others of a similar age, future research should consider using emotion expressions from older faces as well. Third, multimodal recognition for non-emotional stimuli could be assessed to determine if older adults have more general problems in combining auditory and visual information. In addition, future research could also evaluate the role of executive function in efficiently combining multimodal information.

Conclusion and implications
In sum, we found that for unclear emotion expressions, older adults, unlike younger ones, did not get an AV recognition benefit. Indeed, for some emotions (e.g., disgust, and to some extent anger), AV recognition was worse than VO recognition, i.e., good VO recognition was compromised by the additional presentation of unclear AO emotion information. These findings have two main implications. First, older adults had worse unimodal emotion recognition than younger adults, and the lack of an older adult AV presentation benefit meant that with AV presentation they fell further behind the accuracy of younger adults. As such, our results do not require adjustments to theories that posit that older adults have worse emotion recognition than younger adults due to age-related changes in either motivation or brain function. Second, the multimodal emotion information typically available in everyday life is likely not to assist older adults' emotion recognition, since such expressions are often fleeting and non-prototypical, and so may be like the unclear emotion presentations of the current study.
To the best of our knowledge, this is the first study to test the extent to which clear and unclear auditory and visual emotion information is combined by older adults. The findings highlight the potential problems faced by older adults when recognising emotion during everyday life, where stereotypic emotional expressions may not be the norm. This is an important issue, since emotion recognition problems experienced by older adults can impair effective communication. Understanding the aetiology of emotion recognition problems in older adults can lead to targeted interventions to improve emotion recognition ability and, in turn, the wellbeing of older adults.

Supporting information
S1 Appendix. Mean older/younger participant emotion recognition scores for the clear and unclear presenters across presentation modality and emotion. (XLSX)